Predictive Modeling With Big Data: Is Bigger Really Better?

نویسندگان

  • Enric Junqué de Fortuny
  • David Martens
  • Foster Provost
چکیده

With the increasingly widespread collection and processing of "big data," there is natural interest in using these data assets to improve decision making. One of the best understood ways to use data to improve decision making is via predictive analytics. An important, open question is: to what extent do larger data actually lead to better predictive models? In this article we empirically demonstrate that when predictive models are built from sparse, fine-grained data-such as data on low-level human behavior-we continue to see marginal increases in predictive performance even to very large scale. The empirical results are based on data drawn from nine different predictive modeling applications, from book reviews to banking transactions. This study provides a clear illustration that larger data indeed can be more valuable assets for predictive analytics. This implies that institutions with larger data assets-plus the skill to take advantage of them-potentially can obtain substantial competitive advantage over institutions without such access or skill. Moreover, the results suggest that it is worthwhile for companies with access to such fine-grained data, in the context of a key predictive task, to gather both more data instances and more possible data features. As an additional contribution, we introduce an implementation of the multivariate Bernoulli Naïve Bayes algorithm that can scale to massive, sparse data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Big Data Analytics and Now-casting: A Comprehensive Model for Eventuality of Forecasting and Predictive Policies of Policy-making Institutions

The ability of now-casting and eventuality is the most crucial and vital achievement of big data analytics in the area of policy-making. To recognize the trends and to render a real image of the current condition and alarming immediate indicators, the significance and the specific positions of big data in policy-making are undeniable. Moreover, the requirement for policy-making institutions to ...

متن کامل

Towards Conceptual Predictive Modeling for Big Data Framework

Predictive modeling is the process of creating a statistical model from data with the purpose of predicting future behavior. In recent years, the amount of available data has increased exponentially and “Big Data Analysis” is expected to be at the core of most future innovations. Due to the rapid development in the field of data analysis, there is still a lack of consensus on how one should app...

متن کامل

Data-Enhanced Predictive Modeling for Sales Targeting

We describe and analyze the idea of data-enhanced predictive modeling (DEM). The term “enhanced” here refers to the case that the data used for modeling is sampled not from the true target population, but from an alternative (closely related) population, from which much larger samples are available. This leads to a “bias-variance” tradeoff, which implies that in some cases, DEM can improve pred...

متن کامل

MLBCD: a machine learning tool for big clinical data

BACKGROUND Predictive modeling is fundamental for extracting value from large clinical data sets, or "big clinical data," advancing clinical research, and improving healthcare. Machine learning is a powerful approach to predictive modeling. Two factors make machine learning challenging for healthcare researchers. First, before training a machine learning model, the values of one or more model p...

متن کامل

Big Data Efficiency, Information Waste and Lean Big Data Management: Lessons from the Smart Grid Implementation

Big data has become a popular buzzword today with the underlying assumption that bigger data is better. However, by its nature, big data comes with many challenges and environmental costs. In contrast to other research that has examined the benefits and costs of big data independently, our research-in-progress provides an integrated perspective. Theoretically, we draw on the perspective of lean...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Big data

دوره 1 4  شماره 

صفحات  -

تاریخ انتشار 2013